Document Image De-warping Based on Detection of Distorted Text Lines
Internal identifier: 001312 (Main/Exploration); previous: 001311; next: 001313
Authors: Lothar Mischke [Germany]; Wolfram Luther [Germany]
Source:
- Lecture Notes in Computer Science [0302-9743]; 2005.
French descriptors
- Pascal (Inist)
- Wicri:
- topic: Numérisation.
English descriptors
- KwdEn:
Abstract
Abstract: Image warping caused by scanning, photocopying or photographing a document is a common problem in the field of document processing and understanding. Distortion within the text documents impairs OCRability and thus strongly decreases the usability of the results. This is one of the major obstacles for automating the process of digitizing printed documents. In this paper we present a novel algorithm which is able to correct document image warping based on the detection of distorted text lines. The proposed solution is used in a recent project of digitizing old, poor quality manuscripts. The algorithm is compared to other published approaches. Experiments with various document samples and the resulting improvements of the text recognition rate achieved by a commercial OCR engine are also presented.
DOI: 10.1007/11553595_131
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000533
- to stream Istex, to step Curation: 000526
- to stream Istex, to step Checkpoint: 000C21
- to stream Main, to step Merge: 001348
- to stream PascalFrancis, to step Corpus: 000442
- to stream PascalFrancis, to step Curation: 000345
- to stream PascalFrancis, to step Checkpoint: 000414
- to stream Main, to step Merge: 001444
- to stream Main, to step Curation: 001312
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Document Image De-warping Based on Detection of Distorted Text Lines</title>
<author><name sortKey="Mischke, Lothar" sort="Mischke, Lothar" uniqKey="Mischke L" first="Lothar" last="Mischke">Lothar Mischke</name>
</author>
<author><name sortKey="Luther, Wolfram" sort="Luther, Wolfram" uniqKey="Luther W" first="Wolfram" last="Luther">Wolfram Luther</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:093C2B6E6B98D0B3A0122B40D31AFB8A26AD5ED1</idno>
<date when="2005" year="2005">2005</date>
<idno type="doi">10.1007/11553595_131</idno>
<idno type="url">https://api.istex.fr/document/093C2B6E6B98D0B3A0122B40D31AFB8A26AD5ED1/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000533</idno>
<idno type="wicri:Area/Istex/Curation">000526</idno>
<idno type="wicri:Area/Istex/Checkpoint">000C21</idno>
<idno type="wicri:doubleKey">0302-9743:2005:Mischke L:document:image:de</idno>
<idno type="wicri:Area/Main/Merge">001348</idno>
<idno type="wicri:source">INIST</idno>
<idno type="RBID">Pascal:05-0420709</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000442</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000345</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000414</idno>
<idno type="wicri:doubleKey">0302-9743:2005:Mischke L:document:image:de</idno>
<idno type="wicri:Area/Main/Merge">001444</idno>
<idno type="wicri:Area/Main/Curation">001312</idno>
<idno type="wicri:Area/Main/Exploration">001312</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Document Image De-warping Based on Detection of Distorted Text Lines</title>
<author><name sortKey="Mischke, Lothar" sort="Mischke, Lothar" uniqKey="Mischke L" first="Lothar" last="Mischke">Lothar Mischke</name>
<affiliation wicri:level="1"><country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Eduard Spranger Vocational School, Vorheider Weg 8, D-59067, Hamm</wicri:regionArea>
<wicri:noRegion>59067, Hamm</wicri:noRegion>
<wicri:noRegion>Hamm</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Allemagne</country>
</affiliation>
</author>
<author><name sortKey="Luther, Wolfram" sort="Luther, Wolfram" uniqKey="Luther W" first="Wolfram" last="Luther">Wolfram Luther</name>
<affiliation wicri:level="1"><country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Institute of Computer Science and Interactive Systems, University of Duisburg–Essen, Lotharstr. 65, D-47048, Duisburg</wicri:regionArea>
<wicri:noRegion>47048, Duisburg</wicri:noRegion>
<wicri:noRegion>Duisburg</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Allemagne</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2005</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">093C2B6E6B98D0B3A0122B40D31AFB8A26AD5ED1</idno>
<idno type="DOI">10.1007/11553595_131</idno>
<idno type="ChapterID">131</idno>
<idno type="ChapterID">Chap131</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Character recognition</term>
<term>Digitizing</term>
<term>Document processing</term>
<term>Image interpretation</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Printed character</term>
<term>Printed document</term>
<term>Text</term>
<term>Usability</term>
<term>Warping</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Caractère imprimé</term>
<term>Document imprimé</term>
<term>Gauchissement</term>
<term>Interprétation image</term>
<term>Numérisation</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance forme</term>
<term>Reconnaissance optique caractère</term>
<term>Texte</term>
<term>Traitement document</term>
<term>Utilisabilité</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Numérisation</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: Image warping caused by scanning, photocopying or photographing a document is a common problem in the field of document processing and understanding. Distortion within the text documents impairs OCRability and thus strongly decreases the usability of the results. This is one of the major obstacles for automating the process of digitizing printed documents. In this paper we present a novel algorithm which is able to correct document image warping based on the detection of distorted text lines. The proposed solution is used in a recent project of digitizing old, poor quality manuscripts. The algorithm is compared to other published approaches. Experiments with various document samples and the resulting improvements of the text recognition rate achieved by a commercial OCR engine are also presented.</div>
</front>
</TEI>
<affiliations><list><country><li>Allemagne</li>
</country>
</list>
<tree><country name="Allemagne"><noRegion><name sortKey="Mischke, Lothar" sort="Mischke, Lothar" uniqKey="Mischke L" first="Lothar" last="Mischke">Lothar Mischke</name>
</noRegion>
<name sortKey="Luther, Wolfram" sort="Luther, Wolfram" uniqKey="Luther W" first="Wolfram" last="Luther">Wolfram Luther</name>
<name sortKey="Luther, Wolfram" sort="Luther, Wolfram" uniqKey="Luther W" first="Wolfram" last="Luther">Wolfram Luther</name>
<name sortKey="Mischke, Lothar" sort="Mischke, Lothar" uniqKey="Mischke L" first="Lothar" last="Mischke">Lothar Mischke</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001312 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001312 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:093C2B6E6B98D0B3A0122B40D31AFB8A26AD5ED1 |texte= Document Image De-warping Based on Detection of Distorted Text Lines }}
This area was generated with Dilib version V0.6.32.